Locating Matching Method Calls by Mining Revision History Data

Authors

Benjamin Livshits

Thomas Zimmermann

Abstract

Developing an appropriate fix for a software bug often requires a detailed examination of the code as well as generation of appropriate test cases. However, certain categories of bugs are usually easy to fix. In this paper we focus on bugs that can be corrected with a one-line code change. As it turns out, one-line source code changes very often represent bug fixes. Moreover, a significant fraction of previously known bug categories can be addressed with one-line fixes. Careless use of file manipulation routines, failing to call free to deallocate a data structure, failing to use strncpy instead of strcpy for safer string manipulation, and using tainted character arrays as the format argument of fprintf calls are all well-known types of bugs that can typically be corrected with a one-line change of the program source. This paper proposes an analysis of software revision histories to find highly correlated pairs of method calls that naturally form application-specific useful coding patterns. Potential patterns discovered through revision history mining are passed to a runtime analysis tool that looks for pattern violations. We focus our pattern discovery efforts on matching method pairs. Matching pairs such as 〈fopen, fclose〉, 〈malloc, free〉, as well as 〈lock, unlock〉-function calls require exact matching: failing to call the second function in the pair or calling one of the two functions twice in a row is an error. We use common bug fixes as a heuristic that allows us to focus on patterns that caused bugs in the past. The user is presented with a choice of patterns to validate at runtime. Dynamically obtained information about which patterns were violated and which ones held at runtime is presented to the user. 
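The exact-matching rule for method pairs described above can be illustrated with a small trace checker. This is a minimal sketch under stated assumptions, not the authors' runtime analysis tool: the trace format (a list of `(method, object_id)` events) and the function name `check_matching_pair` are hypothetical, chosen only for illustration.

```python
def check_matching_pair(trace, first, second):
    """Report violations of the matching pair <first, second> in a call trace.

    trace: list of (method_name, object_id) events in execution order
           (hypothetical format, assumed for this sketch).
    Returns a list of (event_index, message) tuples.
    """
    open_at = {}      # object_id -> index of the currently unmatched `first` call
    violations = []
    for i, (method, obj) in enumerate(trace):
        if method == first:
            if obj in open_at:
                # `first` seen again before its matching `second`
                violations.append((i, f"{first} called twice in a row on {obj}"))
            open_at[obj] = i
        elif method == second:
            if obj not in open_at:
                # `second` with no pending `first` (includes `second` twice in a row)
                violations.append((i, f"{second} without preceding {first} on {obj}"))
            else:
                del open_at[obj]
    # Anything still open at the end was never closed.
    for obj, i in open_at.items():
        violations.append((i, f"{first} on {obj} never followed by {second}"))
    return violations


if __name__ == "__main__":
    trace = [("fopen", "f1"), ("fopen", "f2"), ("fclose", "f1")]
    for index, message in check_matching_pair(trace, "fopen", "fclose"):
        print(index, message)
```

Tracking the pending call per object, rather than globally, lets interleaved uses of different resources pass as long as each one is individually matched.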
This combination of revision history mining and dynamic analysis techniques proves effective for both discovering new application-specific patterns and for finding errors when applied to very large programs with many man-years of development and debugging effort behind them. To validate our approach, we analyzed Eclipse, a widely used, mature Java application consisting of more than 2,900,000 lines of code. By mining revision histories, we have discovered a total of 32 previously unknown highly application-specific matching method pairs. Out of these, 10 were dynamically confirmed as valid patterns and a total of 107 previously unknown bugs were found as a result of pattern violations. The first author was supported in part by the National Science Foundation under Grant No. 0326227. The second author was supported by the Graduiertenkolleg “Leistungsgarantien für Rechnersysteme”.
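One simple way to surface correlated method-call pairs from a revision history is to count how often two calls are added in the same check-in. The sketch below assumes each revision has already been reduced to the set of method names whose calls it added; the support and confidence thresholds, the confidence formula, and the function name `mine_pairs` are assumptions for illustration, not the ranking the paper itself uses.

```python
from collections import Counter
from itertools import combinations


def mine_pairs(revisions, min_support=2, min_confidence=0.5):
    """Find method pairs whose calls are frequently added together.

    revisions: iterable of collections of method names added per check-in
               (hypothetical pre-processed input).
    Returns (method_a, method_b, support, confidence) tuples, strongest first.
    """
    call_count = Counter()   # number of revisions that add each method call
    pair_count = Counter()   # number of revisions that add both calls of a pair
    for added_calls in revisions:
        calls = set(added_calls)
        call_count.update(calls)
        pair_count.update(combinations(sorted(calls), 2))

    results = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        # Assumed confidence: share of the more frequent call's revisions
        # in which the other call was added as well.
        confidence = support / max(call_count[a], call_count[b])
        if confidence >= min_confidence:
            results.append((a, b, support, confidence))
    results.sort(key=lambda r: (-r[2], -r[3], r[0], r[1]))
    return results


if __name__ == "__main__":
    # Illustrative method names only; not taken from the paper's results.
    history = [
        {"addListener", "removeListener"},
        {"addListener", "removeListener", "log"},
        {"log"},
        {"log", "print"},
    ]
    for pair in mine_pairs(history):
        print(pair)
```

Candidate pairs found this way would still need validation, for example by checking them against execution traces, since co-addition alone does not prove that two calls must match.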

Similar resources

A Semi-Analytical Method for History Matching and Improving Geological Models of Layered Reservoirs: CGM Analytical Method

History matching is used to constrain flow simulations and reduce uncertainty in forecasts. In this work, we revisited some fundamental engineering tools for predicting waterflooding behavior to better understand the flaws in our simulation and thus find some models which are more accurate with better matching. The Craig-Geffen-Morse (CGM) analytical method was used to predict recovery performa...

Data Mining Revision Controlled Document History Metadata for Automatic Classification

Using Historical Data From Source Code Revision Histories to Detect Source Code Properties

Title of Document: USING HISTORICAL DATA FROM SOURCE CODE REVISION HISTORIES TO DETECT SOURCE CODE PROPERTIES Chadd Creighton Williams, Doctor of Philosophy, 2006 Directed By: Professor Jeffrey K. Hollingsworth, Department of Computer Science In this dissertation, we describe several techniques for using historical data mined from the source code revision histories of software projects to deter...

The ROAD from Sensor Data to Process Instances via Interaction Mining

Process mining is a rapidly developing field that aims at automated modeling of business processes based on data coming from event logs. In recent years, advances in tracking technologies, e.g., Real-Time Locating Systems (RTLS), put forward the ability to log business process events as location sensor data. To apply process mining techniques to such sensor data, one needs to overcome an abstrac...

Multiple Sequence Alignment for Characterizing the Lineal Structure of Revision

We present a first approach to the application of a data mining technique, Multiple Sequence Alignment, to the systematization of a polemic aspect of discourse, namely, the expression of contrast, concession, counterargument and semantically similar discursive relations. The representation of the phenomena under study is carried out by very simple techniques, mostly pattern-matching, but the re...

Publication date: 2005
